Author Identification Using Different Sizes of Documents: A Summary
Abstract
In the present research work, we address the problem of authorship attribution for ancient Arabic text documents written by several ancient philosophers. To that end, we conducted several authorship attribution experiments using texts of different sizes. A dedicated dataset, called “A4P” (Authorship Attribution for Ancient Arabic Philosophers), was constructed by extracting texts of different sizes from the books of five ancient Arabic philosophers whose works are quite similar in genre and topic. Text sizes range from 100 to 3000 words per text. Our approach employs two types of features, character N-grams and words, and several classifiers: an SMO-based SVM, a Multi-Layer Perceptron, Linear Regression, the Stamatatos distance and the Manhattan distance. The results show that the minimum text size required for good authorship attribution performance depends on the features and the classification technique used, but overall the performance of the proposed techniques is promising.
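The two distance-based classifiers named above can be illustrated with a minimal sketch of profile-based attribution over character n-grams. It assumes each text is represented by the relative frequencies of its character trigrams and that an unknown text is assigned to the author whose profile is nearest. The function names, the choice n = 3, the optional top_k cut-off and the toy training texts are all hypothetical. The Stamatatos distance follows one common formulation, which sums only over the n-grams in the unknown text's profile, and may differ in detail from the variant used in this work. The SVM, MLP and regression classifiers are not covered here.

```python
from collections import Counter


def char_ngram_profile(text, n=3, top_k=None):
    """Relative frequencies of character n-grams (hypothetical helper).

    Frequencies are normalized over all n-grams in the text; if top_k is
    given, only the top_k most frequent n-grams are kept in the profile.
    """
    grams = [text[i:i + n] for i in range(len(text) - n + 1)]
    total = len(grams)
    return {g: c / total for g, c in Counter(grams).most_common(top_k)}


def manhattan(p, q):
    """Sum of absolute frequency differences over the union of n-grams."""
    return sum(abs(p.get(g, 0.0) - q.get(g, 0.0)) for g in set(p) | set(q))


def stamatatos(unknown, author):
    """Normalized squared difference, summed over the unknown text's n-grams only.

    Each term lies in [0, 4]; f > 0 for every n-gram in `unknown`, so the
    denominator is never zero.
    """
    total = 0.0
    for g, f in unknown.items():
        a = author.get(g, 0.0)
        total += (2 * (f - a) / (f + a)) ** 2
    return total


def attribute(unknown_text, training, distance, n=3, top_k=None):
    """Return the author whose profile is closest to the unknown text (hypothetical)."""
    u = char_ngram_profile(unknown_text, n, top_k)
    profiles = {a: char_ngram_profile(t, n, top_k) for a, t in training.items()}
    return min(profiles, key=lambda a: distance(u, profiles[a]))


if __name__ == "__main__":
    # Toy placeholder texts, not taken from the A4P dataset.
    training = {
        "author_a": "the cat sat on the mat the cat sat",
        "author_b": "zebras zigzag quickly over fuzzy quartz",
    }
    unknown = "the cat sat on a mat"
    print(attribute(unknown, training, manhattan))
    print(attribute(unknown, training, stamatatos))
```

Passing either `manhattan` or `stamatatos` as `distance` switches between the two minimum-distance classifiers. The same profiles could also serve as feature vectors for the learned classifiers. Varying the length of the unknown text, for example between 100 and 3000 words, is how one would probe the text-size effect the abstract describes.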
Similar resources
Local n-grams for Author Identification: Notebook for PAN at CLEF 2013
Our approach to the author identification task combines existing authorship attribution methods based on local n-grams (LNG) in a weighted ensemble. This approach placed third in this year’s competition, using a relatively simple weighting scheme based on training set accuracy. LNG models create profiles, consisting of a list of character n-grams that best represent a particular author’s writin...
Introducing and Analyzing Two Historical Documents about the Development of Tehran at the Reign of Nasiruddin Shah
Tehran changed considerably during the reign of Nasiruddin Shah. The changes began in 1867 with the destruction of the Tahmāsbī fortress, the construction of a new one, and the expansion of the city. Because the town was small, Nasiruddin Shah ordered these changes. There is very little information about how this development took place and its details. The existing data can be ext...
A Study on Author Identification through Stylometry
Electronic communication is one of the most popular ways of communicating today, and e-mail is its most popular form. The Internet serves as the backbone for these communications. In digital forensics, questions arise about who the author of a document is, and whether the author's identity and demographic background can be linked to other documents. So identification of the au...
Author gender identification from text using Bayesian Random Forest
Today, the heavy use of virtual environments and users' connections through social networks such as Facebook, Instagram, and Twitter make it more necessary than ever to identify shared subjects in these environments. Several applications benefit from reliable methods for inferring the age and gender of social media users. Such applications exist across a wide area of fields,...
Publication date: 2015